🤖 feat: update Gemini Flash to Gemini 3.5 Flash #3334

Merged
ThomasK33 merged 9 commits into main from model-updates-f887 (May 20, 2026)

Conversation

@ThomasK33
Member

Summary

Updates the curated Gemini Flash slot so the stable gemini-flash alias now resolves to Gemini 3.5 Flash, with matching local metadata, docs, and provider thinking controls.

Background

Gemini Flash is a stable user-facing alias in Mux. The new Gemini 3.5 Flash release should be the first-class Flash target without adding a separate curated preview entry for the older Gemini 3 Flash Preview model.

Implementation

  • Repointed the curated Gemini Flash entry to Gemini 3.5 Flash while keeping the existing gemini-flash alias.
  • Added local token/capability metadata for Gemini 3.5 Flash.
  • Added a narrow Gemini Flash thinking-policy helper shared by policy and Google provider options.
  • Mapped Mux off to Google minimal for Gemini 3.5 Flash, while preserving low / medium / high with thoughts included.
  • Regenerated model docs and built-in skill content.

Validation

  • bun test v1.2.15 (df017990)
  • Dogfooded in a dev-server sandbox with a copied provider config: selected Gemini 3.5 Flash, sent a prompt, and received a successful Gemini 3.5 Flash response.

Risks

Low-to-moderate risk, scoped to model selection, model metadata, and Google thinking options. Existing Gemini 3.1 Pro behavior is covered by tests and left unchanged.


📋 Implementation Plan

Plan: Repoint Gemini Flash to Gemini 3.5 Flash

Decision

Use Option A: update the existing first-class Flash slot so gemini-flash tracks the latest Flash tier.

  • Replace the curated Gemini Flash model ID from google:gemini-3-flash-preview to the Gemini 3.5 Flash API model ID after verifying the exact ID from Google API/AI Studio (gemini-3.5-flash is the likely ID, but the implementer must confirm against an API model list or official developer docs before committing metadata).
  • Keep gemini-flash as the stable user-facing alias.
  • Do not add a separate first-class selector entry for gemini-3-flash-preview unless verification shows the old preview must remain curated for compatibility.

Recommended approach net product LoC estimate: ~45–75 LoC if local models-extra.ts metadata is needed; ~20–35 LoC if bun scripts/update_models.ts now pulls complete LiteLLM metadata. This excludes tests, docs, and generated models.json churn.

Evidence and constraints

  • Current curated model registry is src/common/constants/knownModels.ts; KNOWN_MODELS, aliases, tokenizer overrides, and selector built-ins derive from MODEL_DEFINITIONS.
  • Current Gemini entries:
    • GEMINI_31_PRO → google:gemini-3.1-pro-preview, aliases gemini, gemini-pro.
    • GEMINI_3_FLASH → google:gemini-3-flash-preview, alias gemini-flash.
  • Prior Gemini history supports this alias policy:
    • Gemini 3.1 Pro replaced the earlier Pro entry and kept bare aliases on latest Pro.
    • Gemini Flash alias was normalized to gemini-flash, implying it should track latest Flash.
  • Current src/common/utils/tokens/models.json probe found gemini-3-flash-preview, but not gemini-3.5-flash.
  • src/common/constants/knownModels.test.ts will fail unless the new providerModelId exists in either models.json or models-extra.ts.
  • Current thinking policy is wrong for a gemini-3.5-flash-style ID: includes("gemini-3-flash") misses it, while generic includes("gemini-3") catches it as Pro-style.
  • Google/DeepMind currently describe Gemini 3.5 Flash as available in Gemini API / AI Studio, with 1M input tokens, 64k output tokens, January 2025 knowledge cutoff, multimodal inputs, text output, and tool use including function calling and structured output.
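
The substring problem noted in the evidence list can be reproduced in isolation. This is a minimal sketch: the two helper names are illustrative and are not the identifiers used in policy.ts; only the two substring checks and the two model IDs come from the plan.

```typescript
// Illustrative helpers mirroring the substring checks described in the plan.
// The names are hypothetical; policy.ts may structure this differently.
function looksLikeFlashBySubstring(modelName: string): boolean {
  return modelName.includes("gemini-3-flash");
}

function looksLikeGemini3BySubstring(modelName: string): boolean {
  return modelName.includes("gemini-3");
}

const oldFlash = "gemini-3-flash-preview";
const newFlash = "gemini-3.5-flash";

// The old preview ID matches the Flash check, as intended.
const oldFlashMatches = looksLikeFlashBySubstring(oldFlash);
// The dotted 3.5 ID does not contain the literal "gemini-3-flash"...
const newFlashMatches = looksLikeFlashBySubstring(newFlash);
// ...but it does contain "gemini-3", so it lands in the generic Pro-style branch.
const newFlashFallsThrough = looksLikeGemini3BySubstring(newFlash);

console.log({ oldFlashMatches, newFlashMatches, newFlashFallsThrough });
```

This is why Phase 3 replaces substring detection with an explicit Flash check that runs before the generic Gemini 3 branch.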

Phase 0 — Verify exact provider facts before editing

  1. Confirm the exact Gemini API model ID from one of:
    • Google AI Studio model picker / API model list.
    • Official Gemini API developer docs if updated.
    • A safe read-only listModels call using a configured Google API key, if available.
  2. Confirm pricing source:
    • Prefer official Gemini API pricing docs if updated for Gemini 3.5 Flash.
    • If official pricing is not yet published in developer docs, either:
      • use verified LiteLLM metadata from bun scripts/update_models.ts, or
      • add conservative local metadata with a comment that it must be revisited once Google publishes official pricing.
  3. Confirm thinking semantics:
    • Gemini Flash family should expose minimal, low, medium, high on the Google API side.
    • Mux should continue exposing user-facing off, low, medium, high, mapping off to Google minimal for Flash models that do not support true thinking-off.

Quality gate: record the exact source used for model ID, limits, pricing, and thinking levels in code comments near local metadata or provider mapping if official docs are incomplete/ambiguous.

Phase 1 — Repoint the curated model registry

Edit src/common/constants/knownModels.ts:

  1. Keep the existing GEMINI_3_FLASH key by default for a minimal Option A diff. Add or update its comment to say it tracks the latest Flash tier. Only rename to GEMINI_35_FLASH if rg "GEMINI_3_FLASH" shows negligible references and the resulting diff is smaller/clearer.

  2. Set providerModelId to the verified API ID, expected:

    providerModelId: "gemini-3.5-flash"
  3. Keep only the stable alias unless product explicitly wants version-specific slash aliases:

    aliases: ["gemini-flash"]

    Users can still select the exact full model string with /model google:gemini-3.5-flash; avoiding a version alias minimizes future cleanup.

  4. Keep tokenizer override unless ai-tokenizer has added a better exact tokenizer:

    tokenizerOverride: "google/gemini-2.5-pro"

Quality gate: run bun test src/common/constants/knownModels.test.ts after metadata work; alias uniqueness and token metadata coverage should pass. Add a targeted alias assertion if not already covered by nearby tests: MODEL_ABBREVIATIONS["gemini-flash"] === "google:<verified-id>" or resolveModelAlias("gemini-flash") === "google:<verified-id>".
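
A rough sketch of how the repointed entry and the alias assertion fit together. The ModelDefinition shape and the buildAliasMap helper are assumptions made for illustration; the real MODEL_DEFINITIONS in knownModels.ts is not shown in this PR and may differ. The model IDs, aliases, and tokenizer override are the ones named in the plan, and "gemini-3.5-flash" is still the expected ID pending verification.

```typescript
// Hypothetical definition shape; the real one in knownModels.ts may differ.
interface ModelDefinition {
  provider: string;
  providerModelId: string;
  aliases: readonly string[];
  tokenizerOverride?: string;
}

const MODEL_DEFINITIONS: Record<string, ModelDefinition> = {
  GEMINI_31_PRO: {
    provider: "google",
    providerModelId: "gemini-3.1-pro-preview",
    aliases: ["gemini", "gemini-pro"],
  },
  // Key kept for a minimal diff; the entry tracks the latest Flash tier.
  GEMINI_3_FLASH: {
    provider: "google",
    providerModelId: "gemini-3.5-flash", // expected ID, to be verified in Phase 0
    aliases: ["gemini-flash"],
    tokenizerOverride: "google/gemini-2.5-pro",
  },
};

// Builds alias -> "provider:modelId" and rejects duplicate aliases.
function buildAliasMap(defs: Record<string, ModelDefinition>): Map<string, string> {
  const aliasMap = new Map<string, string>();
  for (const def of Object.values(defs)) {
    for (const alias of def.aliases) {
      if (aliasMap.has(alias)) {
        throw new Error(`Duplicate alias: ${alias}`);
      }
      aliasMap.set(alias, `${def.provider}:${def.providerModelId}`);
    }
  }
  return aliasMap;
}

const aliasMap = buildAliasMap(MODEL_DEFINITIONS);

// Hypothetical stand-in for resolveModelAlias: unknown inputs pass through.
function resolveModelAlias(input: string): string {
  return aliasMap.get(input) ?? input;
}

console.log(resolveModelAlias("gemini-flash"));
```

Because unknown strings pass through unchanged in this sketch, an explicit google:gemini-3-flash-preview selection keeps working while only the curated alias moves.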

Phase 2 — Add or refresh token/capability metadata

Preferred path:

  1. Run bun scripts/update_models.ts before adding manual metadata.
  2. Inspect the generated diff. Keep it only if the churn is acceptable and it adds a bare key for the verified model ID, expected "gemini-3.5-flash", with complete pricing/context/capability fields.
  3. If the refresh only adds provider-scoped keys such as gemini/gemini-3.5-flash, knownModels.test.ts will still fail for a google: known model; add a bare-key fallback in models-extra.ts instead of relying on scoped-only metadata.

Fallback path if LiteLLM is not updated, creates broad unrelated churn, or lacks a bare key:

  1. Add an entry to src/common/utils/tokens/models-extra.ts keyed by the bare provider model ID, expected "gemini-3.5-flash".
  2. Include at minimum:
    • max_input_tokens: 1048576
    • max_output_tokens: 65536
    • input_cost_per_token and output_cost_per_token from a verified pricing source
    • cache_read_input_token_cost only if the verified pricing source confirms context-cache pricing
    • litellm_provider: "vertex_ai-language-models"
    • mode: "chat"
    • supports_function_calling: true
    • supports_vision: true
    • supports_pdf_input: true
    • supports_reasoning: true
    • supports_response_schema: true
    • knowledge_cutoff: "2025-01"
  3. If storing official multimodal support locally, extend the local ModelData interface in models-extra.ts to include:
    • supports_audio_input?: boolean
    • supports_video_input?: boolean
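
The fallback entry and the bare-key requirement can be sketched as follows. The ModelData interface here is a trimmed, hypothetical version of the one in models-extra.ts, and findBareMetadata is an illustrative helper, not an existing function. Pricing fields are deliberately left unset because they must come from a verified source.

```typescript
// Trimmed, hypothetical version of the local ModelData interface.
interface ModelData {
  max_input_tokens: number;
  max_output_tokens: number;
  // Optional in this sketch only: real values must come from a verified
  // pricing source and are intentionally not reproduced here.
  input_cost_per_token?: number;
  output_cost_per_token?: number;
  litellm_provider: string;
  mode: "chat";
  supports_function_calling?: boolean;
  supports_vision?: boolean;
  supports_pdf_input?: boolean;
  supports_reasoning?: boolean;
  supports_response_schema?: boolean;
  knowledge_cutoff?: string;
}

const flashMetadata: ModelData = {
  max_input_tokens: 1048576,
  max_output_tokens: 65536,
  litellm_provider: "vertex_ai-language-models",
  mode: "chat",
  supports_function_calling: true,
  supports_vision: true,
  supports_pdf_input: true,
  supports_reasoning: true,
  supports_response_schema: true,
  knowledge_cutoff: "2025-01",
};

// Simulates a models.json refresh that only added a provider-scoped key.
const generatedScopedOnly: Record<string, ModelData> = {
  "gemini/gemini-3.5-flash": flashMetadata,
};

// Simulates the models-extra.ts fallback keyed by the bare provider model ID.
const modelsExtra: Record<string, ModelData> = {
  "gemini-3.5-flash": flashMetadata,
};

// Looks up a bare provider model ID across metadata sources, in order.
function findBareMetadata(
  providerModelId: string,
  ...sources: Array<Record<string, ModelData>>
): ModelData | undefined {
  for (const source of sources) {
    if (Object.prototype.hasOwnProperty.call(source, providerModelId)) {
      return source[providerModelId];
    }
  }
  return undefined;
}

const scopedOnlyResult = findBareMetadata("gemini-3.5-flash", generatedScopedOnly);
const withFallbackResult = findBareMetadata("gemini-3.5-flash", generatedScopedOnly, modelsExtra);
console.log(scopedOnlyResult === undefined, withFallbackResult?.max_input_tokens);
```

The first lookup misses because only the scoped key exists, which is the case where the known-model metadata test would still fail.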

Quality gate: add/adjust src/common/utils/tokens/modelStats.test.ts and src/common/utils/ai/modelCapabilities.test.ts only around behavior that matters: context size, nonzero pricing, and media support. Avoid tautological tests that only repeat static prose.

Phase 3 — Fix Gemini Flash thinking policy and provider mapping

Edit src/common/utils/thinking/policy.ts:

  1. Replace literal substring detection for Flash with a narrow helper that matches only verified chat Flash IDs, for example:

    function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
      return (
        modelName === "gemini-3-flash-preview" ||
        modelName === "gemini-3.5-flash" ||
        modelName === "gemini-3.5-flash-preview" // only keep if this ID is verified
      );
    }

    Use the helper before the generic Gemini 3/3.1 Pro branch. Avoid a broad regex that accidentally treats gemini-3.1-flash-lite-preview, image, TTS, or other non-chat variants as the same model.

  2. Return Mux levels for verified Flash chat models:

    ["off", "low", "medium", "high"]
  3. Keep Pro behavior separate. If current docs now say Gemini 3.1 Pro supports medium, decide whether to broaden Pro in a separate change; do not conflate that with Gemini 3.5 Flash support unless required by failing tests or verified product behavior.
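
Putting the three steps together, a minimal sketch of the branch ordering might look like this. getGeminiThinkingPolicy is a hypothetical, Gemini-only stand-in for the relevant part of getThinkingPolicyForModel; the Pro levels ["low", "high"] are the ones quoted later in review, and the "xhigh" and "max" members of the level type are taken from the provider-mapping notes.

```typescript
type ThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";

// Narrow helper from the plan: exact matches on verified chat Flash IDs only.
function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
  return modelName === "gemini-3-flash-preview" || modelName === "gemini-3.5-flash";
}

// Hypothetical Gemini-only slice of the policy function. Returns undefined
// for models this sketch does not cover.
function getGeminiThinkingPolicy(modelName: string): readonly ThinkingLevel[] | undefined {
  // The Flash check must run first; otherwise the generic branch below
  // would claim "gemini-3.5-flash" as Pro-style.
  if (isGeminiFlashThinkingLevelModelName(modelName)) {
    return ["off", "low", "medium", "high"];
  }
  // Generic Gemini 3 / 3.1 Pro branch, left unchanged.
  if (modelName.includes("gemini-3")) {
    return ["low", "high"];
  }
  return undefined;
}

console.log(getGeminiThinkingPolicy("gemini-3.5-flash"));
console.log(getGeminiThinkingPolicy("gemini-3.1-pro-preview"));
```

With exact matching, an ID such as gemini-3.1-flash-lite-preview is not treated as a Flash chat model and simply falls through to the generic branch in this sketch.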

Edit src/common/utils/ai/providerOptions.ts as a required part of this change:

  1. Reuse the same Flash detection helper, or extract a tiny shared helper, so policy and provider option mapping cannot drift.

  2. The current Google branch sends thinkingConfig.thinkingLevel for capModelName.includes("gemini-3"); gemini-3.5-flash should still enter that branch.

  3. For verified Flash chat models, map Mux off to Google minimal and do not set includeThoughts for that lowest mode unless verified docs require it:

    thinkingConfig = { thinkingLevel: "minimal" };

    Do not rely on omitting thinkingConfig; Gemini 3.5 Flash may default to medium, which would make Mux off misleading.

  4. For Flash low, medium, and high, pass through the level and keep includeThoughts: true:

    thinkingConfig = { includeThoughts: true, thinkingLevel: effectiveThinking };
  5. If xhigh or max somehow reaches provider mapping despite policy enforcement, defensively map to high rather than throwing in the request path. Add a short comment that policy should clamp before provider options, but the provider adapter avoids sending invalid Google values.
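
The mapping rules in steps 3 to 5 can be sketched as one small function. buildFlashThinkingConfig and the two type aliases are hypothetical names; the actual code in providerOptions.ts is not shown here. Keeping includeThoughts: true for the clamped xhigh/max case is an assumption, chosen to match the behavior for high.

```typescript
type MuxThinkingLevel = "off" | "low" | "medium" | "high" | "xhigh" | "max";
type GoogleThinkingLevel = "minimal" | "low" | "medium" | "high";

interface GoogleThinkingConfig {
  thinkingLevel: GoogleThinkingLevel;
  includeThoughts?: boolean;
}

// Hypothetical mapping for verified Flash chat models only.
function buildFlashThinkingConfig(level: MuxThinkingLevel): GoogleThinkingConfig {
  if (level === "off") {
    // Send "minimal" explicitly instead of omitting thinkingConfig, so the
    // API default cannot make Mux "off" misleading. No includeThoughts here.
    return { thinkingLevel: "minimal" };
  }
  if (level === "low" || level === "medium" || level === "high") {
    return { includeThoughts: true, thinkingLevel: level };
  }
  // Policy should clamp before this point; the adapter still avoids sending
  // an invalid Google value by mapping xhigh/max down to "high".
  // Assumption: thoughts stay included, as for "high".
  return { includeThoughts: true, thinkingLevel: "high" };
}

console.log(buildFlashThinkingConfig("off"));
console.log(buildFlashThinkingConfig("medium"));
console.log(buildFlashThinkingConfig("max"));
```

Returning a value for every input, rather than throwing, keeps the request path safe even if a level that the policy should have clamped slips through.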

Quality gate: extend src/common/utils/thinking/policy.test.ts and src/common/utils/ai/providerOptions.test.ts to prove:

  • google:gemini-3.5-flash gets off/low/medium/high.
  • gateway form like mux-gateway:google/gemini-3.5-flash behaves the same.
  • Optional explicit gateway form like openrouter:google/gemini-3.5-flash behaves correctly if current normalization supports it.
  • Flash off maps to { thinkingConfig: { thinkingLevel: "minimal" } } without includeThoughts unless docs prove otherwise.
  • Flash medium maps to { thinkingConfig: { includeThoughts: true, thinkingLevel: "medium" } }.
  • Gemini 3.1 Pro behavior remains unchanged.
  • Optional custom model mapping: a provider model entry mappedToModel: "google:gemini-3.5-flash" uses Flash mapping for policy/provider options.

Phase 4 — Update docs and generated/model-adjacent outputs

  1. Run or update scripts/gen_docs.ts output so docs/config/models.mdx lists:
    • Gemini 3.5 Flash
    • google:<verified-id>
    • alias gemini-flash
  2. If display output is unexpectedly wrong, add a focused src/common/utils/ai/modelDisplay.test.ts case. The current generic Gemini formatter likely needs no production change, but a dotted-version expectation is cheap if touched nearby.
  3. Search for stale KNOWN_MODELS.GEMINI_3_FLASH references only if the key is renamed. If the key is kept, no reference churn is expected.

Quality gate: do not hand-edit generated docs if an existing generation script owns the table; run the generator and keep only expected diffs.

Phase 5 — Validation

Run targeted tests first:

bun test src/common/constants/knownModels.test.ts
bun test src/common/utils/thinking/policy.test.ts
bun test src/common/utils/ai/providerOptions.test.ts
bun test src/common/utils/tokens/modelStats.test.ts
bun test src/common/utils/ai/modelCapabilities.test.ts
bun test src/common/utils/ai/modelDisplay.test.ts

Then run broader checks:

make typecheck
make fmt-check
make static-check

If bun scripts/update_models.ts produces broad generated churn, inspect whether it is acceptable; if too broad, prefer models-extra.ts for this targeted launch support.

Phase 6 — Dogfooding plan

Because this is a model-selection/provider behavior change, dogfood in the desktop app with a configured Google provider.

  1. Start Mux:

    make dev
  2. In Settings → Providers, confirm Google is configured and enabled.

  3. Use the model selector and confirm:

    • Gemini 3.5 Flash appears.
    • gemini-flash resolves to google:<verified-id>.
    • old Gemini 3 Flash Preview is no longer the curated gemini-flash target.
  4. Send smoke prompts at all Flash thinking levels:

    • off / numeric 0
    • low
    • medium
    • high
  5. Use agent-browser to capture reviewer evidence:

    • Screenshot of the model selector showing Gemini 3.5 Flash.
    • Screenshot of a successful response using gemini-flash.
    • Screenshot of thinking-level control or slash-command usage.
    • Video recording of selecting the model and sending one prompt.
  6. Multimodal smoke check if provider/API key allows it:

    • Attach a small image or PDF and verify the send path is allowed.
    • Capture a screenshot of the attachment flow and successful response.

Acceptance criteria

  • gemini-flash resolves to the verified Gemini 3.5 Flash Google model ID.
  • No new version-specific alias is added unless product explicitly asks for it.
  • The first-class model selector lists Gemini 3.5 Flash when Google/direct or configured gateway routing makes it available.
  • The known-model metadata invariant passes with a bare metadata key for the verified provider model ID in models.json or models-extra.ts.
  • Token meter/context warnings use Gemini 3.5 Flash limits and costs.
  • Gemini 3.5 Flash thinking policy exposes Mux levels off/low/medium/high.
  • Provider options translate Mux off to Google minimal for Gemini 3.5 Flash instead of accidentally using the API default, and omit includeThoughts for this lowest mode unless docs prove otherwise.
  • Provider options pass Flash low/medium/high through with includeThoughts: true.
  • Existing Gemini Pro behavior is unchanged unless explicitly verified and intentionally updated.
  • Docs table reflects Gemini 3.5 Flash.
  • Targeted tests, typecheck, fmt-check, and static-check pass.
  • Dogfooding screenshots and a video recording are captured for reviewer verification.

Risks and mitigations

  • API model ID ambiguity: block implementation until exact ID is verified from official API/AI Studio, not inferred only from marketing copy.
  • Pricing docs lag: prefer LiteLLM refresh if available; otherwise add local metadata with a clear source/revisit comment. Do not commit press/blog-derived pricing unless official API pricing, LiteLLM, or another trusted provider metadata source confirms it.
  • Thinking-level drift: keep tests focused on observed provider behavior, especially off → minimal and absence of includeThoughts for the lowest mode unless docs require it.
  • Overbroad Flash matching: use a narrow verified-ID helper so image, TTS, Flash Lite, or future non-chat variants do not inherit chat-model thinking behavior accidentally.
  • Generated metadata churn: if models.json refresh touches many unrelated entries or lacks a bare key, use models-extra.ts for a surgical release.
  • Alias compatibility: existing users selecting google:gemini-3-flash-preview explicitly can still use it as a custom model; only the curated gemini-flash alias changes.

_Generated with [mux](https://github.com/coder/mux)_

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review

@mintlify

mintlify Bot commented May 19, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
Mux 🟢 Ready View Preview May 19, 2026, 9:42 PM

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


@coder-agents-review coder-agents-review Bot left a comment

Well-structured model repoint with good separation of concerns: shared Flash detection helper prevents policy/provider drift, defensive xhigh/max clamping replaces an unsafe type assertion, and Flash off-to-minimal explicitly sends what the old code left to server defaults. Test coverage is thorough, ratio is solid, and the diff is proportional.

Severity count: 3 P2, 4 P3, 1 Nit.

The P2s are a wrong knowledge cutoff (trivial fix, verified against GA-day sources), versioned Flash ID fallthrough in the exact-match Set, and a misleading test name that hides a coverage gap for Pro+off. The P3s are naming debt in two locations plus a comment/test gap.

Pariston tried to break the change and couldn't: "I tried to build a case against this change and could not. The problem is correctly understood across all four framings."

Process note: the commit subject lacks a type prefix (feat(knownModels): repoint gemini-flash alias to Gemini 3.5 Flash or similar would match the PR title convention).

🤖 This review was automatically generated with Coder Agents.

Comment thread src/common/utils/tokens/models-extra.ts
Comment thread src/common/utils/thinking/policy.ts Outdated
Comment thread src/common/utils/ai/providerOptions.test.ts Outdated
Comment thread src/common/utils/ai/providerOptions.ts Outdated
Comment thread src/common/utils/ai/providerOptions.ts Outdated
Comment thread src/common/utils/ai/providerOptions.test.ts
Comment thread src/common/utils/thinking/policy.ts
Comment thread src/common/constants/knownModels.ts Outdated
@ThomasK33
Member Author

Addressed coder-agents-review findings:

  • DEREM-3: Updated Gemini 3.5 Flash metadata knowledge_cutoff to 2026-01.
  • DEREM-4: Broadened Gemini 3.5 Flash detection to cover versioned IDs such as gemini-3.5-flash-001, -latest, and -preview, while excluding gemini-3.5-flash-lite from that prefix match.
  • DEREM-5: Renamed the misleading Gemini 3.1 Pro test and added explicit Pro+off coverage for the no-provider-thinking-config path.
  • DEREM-6: Generalized the provider-options comment to describe Gemini Flash chat models rather than only Gemini 3.5 Flash.
  • DEREM-7: Renamed the Google control-flow boolean to usesGeminiThinkingLevelConfig to reflect the actual provider-option behavior.
  • DEREM-8: Added direct coverage for mux-gateway:google/gemini-3.5-flash with off mapping to minimal.
  • DEREM-9: Added a doc comment to the exported Gemini Flash thinking helper documenting its expected bare-model input and thinkingLevel contract.
  • DEREM-1: Renamed the known-model key from GEMINI_3_FLASH to GEMINI_FLASH.

Validation rerun after these fixes:

  • bun test src/common/constants/knownModels.test.ts src/common/utils/thinking/policy.test.ts src/common/utils/ai/providerOptions.test.ts src/common/utils/tokens/modelStats.test.ts src/common/utils/ai/modelCapabilities.test.ts src/common/utils/ai/modelDisplay.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7c4505dd4e

Comment thread src/common/utils/thinking/policy.ts Outdated

@coder-agents-review coder-agents-review Bot left a comment

All 8 R1 findings (3 P2, 4 P3, 1 Nit) addressed in a single clean fix commit. Each fix targets the root cause: versioned ID detection uses prefix matching instead of adding one more entry to the Set, the mislabeled test was both renamed and supplemented with the missing Pro+off coverage, and naming changes reflect actual semantics rather than implementation details. Fix-to-finding ratio is 1:1 with no scope drift.

R2 panel (9 reviewers): 6 found no new issues; 3 raised minor gaps (3 P3, 3 Nit). The P3s are untested branches in the new code: the legacy Flash model's behavior change, the flash-lite exclusion guard, and a JSDoc that omits the new Flash branch. The Nits are polish on the new doc comment, the now-single-entry Set name, and missing source URL in metadata.

Pariston again tried to break the change: "I tried to build a case against this change and couldn't. The problem is correctly understood, the solution is proportional, and the fix is at the right level."


src/common/utils/thinking/policy.ts:53

P3 [DEREM-12] getThinkingPolicyForModel JSDoc rules list (lines 42-58) enumerates every model branch but omits Gemini Flash entirely. The code has a dedicated Flash branch at line 112-114 returning ["off", "low", "medium", "high"], but the JSDoc only shows gemini-3 → ["low", "high"]. Someone reading the doc to understand Flash levels gets the Pro policy instead.

Add a rule line: * - gemini-3.5-flash (and gemini-3-flash-preview) → ["off", "low", "medium", "high"] (Flash thinking levels)

(Leorio)

🤖 This review was automatically generated with Coder Agents.

Comment thread src/common/utils/ai/providerOptions.test.ts
Comment thread src/common/utils/thinking/policy.ts
Comment thread src/common/utils/thinking/policy.ts Outdated
Comment thread src/common/utils/thinking/policy.ts Outdated
Comment thread src/common/utils/tokens/models-extra.ts
@ThomasK33
Member Author

Addressed the Codex finding about versioned Gemini 3 Flash Preview IDs:

  • Extended the Gemini Flash thinking-level matcher to also accept gemini-3-flash-preview-* versioned/pinned IDs.
  • Added policy coverage for gemini-3-flash-preview-20251217 and gemini-3-flash-preview-latest.
  • Added provider-options coverage proving versioned Gemini 3 Flash Preview off maps to Google thinkingLevel: "minimal" without includeThoughts.

Validation rerun:

  • bun test src/common/utils/thinking/policy.test.ts src/common/utils/ai/providerOptions.test.ts
  • make static-check

@ThomasK33
Member Author

Addressed coder-agents-review round 2 findings:

  • DEREM-10: Added provider-options coverage for exact google:gemini-3-flash-preview with off mapping to Google thinkingLevel: "minimal".
  • DEREM-11: Added direct coverage that gemini-3.5-flash-lite is not classified by the Gemini Flash thinking-level chat-model predicate.
  • DEREM-13: Reworded the exported helper doc comment as a predicate contract and clarified that it expects a bare model name.
  • DEREM-14: Removed the single-entry Set and inlined the legacy exact-match condition so the matcher is easier to maintain.
  • DEREM-15: Added a source note to the Gemini 3.5 Flash metadata comment.

Validation rerun after these fixes:

  • bun test src/common/utils/thinking/policy.test.ts src/common/utils/ai/providerOptions.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 70d6ea6272

Comment thread src/common/utils/tokens/models-extra.ts Outdated

@coder-agents-review coder-agents-review Bot left a comment

5 of 6 R2 findings addressed (DEREM-10, 11, 13, 14, 15). One finding is unaddressed:

DEREM-12 (P3, policy.ts:53): getThinkingPolicyForModel JSDoc rules list omits the Gemini Flash branch entirely. The code has a dedicated Flash branch at line 112-114 returning ["off", "low", "medium", "high"], but the JSDoc only shows gemini-3 → ["low", "high"]. Someone reading the doc to understand Flash thinking levels gets the Pro policy instead.

Further review is blocked until DEREM-12 is addressed (fix or explicit response). The fix is a one-line JSDoc addition.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33
Member Author

Addressed the latest review feedback:

  • Codex cutoff feedback: verified the current official Google/DeepMind Gemini 3.5 Flash model info lists a January 2025 knowledge cutoff, so I changed knowledge_cutoff back to 2025-01 and clarified the metadata source comment.
  • DEREM-12: added the missing getThinkingPolicyForModel JSDoc rule for Gemini Flash chat variants returning ["off", "low", "medium", "high"], distinct from Gemini Pro variants returning ["low", "high"].

Validation rerun:

  • bun test src/common/utils/thinking/policy.test.ts src/common/utils/tokens/modelStats.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. More of your lovely PRs please.

@coder-agents-review coder-agents-review Bot left a comment

All prior findings addressed. DEREM-12 (the R3 blocker) is fixed. DEREM-3 (knowledge cutoff) is resolved in the author's favor: Google DeepMind's official model card at deepmind.google/models/gemini/flash/ lists "Knowledge cutoff: January 2025" for Gemini 3.5 Flash. The third-party aggregators (llm-stats.com, handyai, felloai) that reported January 2026 were wrong. The R1 finding was based on those aggregators; the author's reversion to "2025-01" based on the primary source was correct. Apologies for the churn.

R4 panel (8 reviewers): 5 found no new issues. New findings are 1 P3 and 2 Nits.

Pariston investigated the source conflict independently and reached the right conclusion: "Google DeepMind's own model card lists 'Knowledge cutoff: January 2025' for the 3.5 Flash row. The third-party aggregators appear to be inferring or copying from each other rather than from Google's published model card."

🤖 This review was automatically generated with Coder Agents.

Comment thread src/common/utils/ai/providerOptions.test.ts
Comment thread src/common/utils/thinking/policy.ts Outdated
Comment thread src/common/utils/thinking/policy.ts
@ThomasK33
Member Author

Addressed coder-agents-review round 4 findings:

  • DEREM-16: Added coverage that unsupported Gemini 3.5 Flash max is defensively mapped to Google thinkingLevel: "high", matching the existing xhigh behavior.
  • DEREM-17: Updated the thinking policy JSDoc rule to say Gemini Flash chat variants instead of gemini-3 Flash chat variants.
  • DEREM-18: Added an @param note documenting that the helper expects a provider model ID without the provider prefix, with examples.

Validation rerun:

  • bun test src/common/utils/thinking/policy.test.ts src/common/utils/ai/providerOptions.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7ed21f11db

Comment thread src/common/utils/thinking/policy.ts Outdated
@ThomasK33
Member Author

Addressed Codex feedback about non-preview Gemini 3 Flash IDs:

  • Extended the Flash thinking-level matcher to include gemini-3-flash and versioned non-preview forms such as gemini-3-flash-001, while retaining the Flash Lite exclusion.
  • Added policy tests for google:gemini-3-flash and google:gemini-3-flash-001.
  • Added provider-options coverage proving non-preview google:gemini-3-flash with off maps to Google thinkingLevel: "minimal".
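
After the successive review rounds, the matcher covers bare, versioned, and preview Flash IDs while still excluding Flash Lite. The sketch below is one way to express those rules; FLASH_BASE_IDS and the suffix handling are assumptions for illustration, not the code that was merged. As written it accepts any non-Lite suffix, so other non-chat variants would need their own exclusions.

```typescript
// Hypothetical base IDs; versioned and preview forms hang off these prefixes.
const FLASH_BASE_IDS = ["gemini-3.5-flash", "gemini-3-flash"] as const;

// Expects a bare provider model ID (no "google:" prefix, no gateway path).
function isGeminiFlashThinkingLevelModelName(modelName: string): boolean {
  for (const base of FLASH_BASE_IDS) {
    if (modelName === base) {
      return true;
    }
    if (!modelName.startsWith(`${base}-`)) {
      continue;
    }
    const suffix = modelName.slice(base.length + 1);
    // Flash Lite is a different model and must not inherit Flash chat levels.
    if (suffix === "lite" || suffix.startsWith("lite-")) {
      return false;
    }
    // Any other suffix (for example "001", "latest", "preview",
    // "preview-20251217") is treated as a versioned or pinned Flash chat ID.
    return true;
  }
  return false;
}

const samples = [
  "gemini-3.5-flash",
  "gemini-3.5-flash-001",
  "gemini-3-flash",
  "gemini-3-flash-preview-20251217",
  "gemini-3.5-flash-lite",
  "gemini-3.1-pro-preview",
];
for (const id of samples) {
  console.log(id, isGeminiFlashThinkingLevelModelName(id));
}
```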

Validation rerun:

  • bun test src/common/utils/thinking/policy.test.ts src/common/utils/ai/providerOptions.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 769e8977d6

Comment thread src/common/utils/ai/providerOptions.ts Outdated

@coder-agents-review coder-agents-review Bot left a comment

All 18 findings from rounds 1-4 are addressed. R5 panel (6 reviewers): no new findings above Note.

One observation worth flagging: Hisoka noted that the old curated Flash (gemini-3-flash-preview) inherited max_pdf_size_mb: 30 from models.json, but the new gemini-3.5-flash entry in models-extra.ts does not include it. Three call sites guard on caps?.maxPdfSizeMb !== undefined and skip validation when absent, meaning oversized PDFs now pass the client and fail at the Google API instead of getting a clean rejection. If Gemini 3.5 Flash has the same 30MB ceiling, adding max_pdf_size_mb: 30 restores parity.
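
To illustrate the observation, here is a hedged sketch of a guard that skips validation when the limit is absent. Only the caps?.maxPdfSizeMb !== undefined check comes from the review text; the ModelCapabilities shape, the validatePdfSize helper, and the error message are hypothetical.

```typescript
// Hypothetical capability shape; only the maxPdfSizeMb field name is from the review.
interface ModelCapabilities {
  maxPdfSizeMb?: number;
}

// Returns an error message when the PDF is too large, or null when it is accepted.
function validatePdfSize(
  caps: ModelCapabilities | undefined,
  sizeBytes: number,
): string | null {
  const limitMb = caps?.maxPdfSizeMb;
  if (limitMb !== undefined) {
    const limitBytes = limitMb * 1024 * 1024;
    if (sizeBytes > limitBytes) {
      return `PDF exceeds the ${limitMb} MB limit for this model`;
    }
  }
  // No limit is known, or the file is within the limit: nothing is rejected here.
  return null;
}
```

With a guard of this shape, a model entry without max_pdf_size_mb accepts any file size on the client, which is the parity gap the review describes.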

This PR is in good shape. Clean test coverage, proportional scope, all review findings addressed across 5 rounds.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33
Member Author

Addressed Codex feedback about namespaced Google model IDs:

  • Normalized the Google capability model name to a bare provider model ID before checking Gemini Flash thinking-level support.
  • Added coverage for google:models/gemini-3.5-flash with off mapping to Google thinkingLevel: "minimal".
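
The normalization step can be sketched roughly as follows. This is an illustration under assumptions, not the code in providerOptions.ts: the helper names, the MuxThinkingLevel subset, the inline Flash pattern, and the pass-through of levels other than off are all hypothetical. Only the off to "minimal" mapping and the google:models/ form come from this thread.

```typescript
// Assumed subset of Mux thinking levels, for illustration only.
type MuxThinkingLevel = "off" | "low" | "medium" | "high";

// Strip an optional "google:" provider prefix, then an optional "models/" namespace.
function toBareGoogleModelId(modelString: string): string {
  return modelString.replace(/^google:/, "").replace(/^models\//, "");
}

// Returns the Google thinkingLevel for Flash models, or undefined for other models.
function googleThinkingLevelFor(
  modelString: string,
  level: MuxThinkingLevel,
): string | undefined {
  const bareId = toBareGoogleModelId(modelString);
  // Hypothetical Flash check that excludes Flash Lite variants.
  const isFlash = /^gemini-3(?:\.5)?-flash(?!-lite)/.test(bareId);
  if (!isFlash) return undefined;
  // "off" maps to "minimal"; passing other levels through unchanged is an assumption.
  return level === "off" ? "minimal" : level;
}
```

Normalizing before the capability check means google:gemini-3.5-flash and google:models/gemini-3.5-flash take the same path.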

Validation rerun:

  • bun test src/common/utils/ai/providerOptions.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

@coder-agents-review coder-agents-review Bot left a comment

All prior findings addressed. R6 delta (18 lines) broadens old Flash detection to cover non-preview IDs (gemini-3-flash, gemini-3-flash-001). Netero and 5 of 6 panel reviewers found no issues. One P3 from Bisky: the new gemini-3-flash-lite exclusion guard lacks its own test (sibling of the gemini-3.5-flash-lite guard that DEREM-11 covered).

🤖 This review was automatically generated with Coder Agents.

Comment thread src/common/utils/thinking/policy.test.ts
@ThomasK33
Member Author

Addressed coder-agents-review DEREM-19:

  • Added direct coverage that gemini-3-flash-lite is not classified as a Flash thinking-level chat model, alongside the existing gemini-3.5-flash-lite coverage.

Validation rerun:

  • bun test src/common/utils/thinking/policy.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review

@coder-agents-review coder-agents-review Bot left a comment

DEREM-19 (P3) is unaddressed: gemini-3-flash-lite exclusion guard at policy.ts:37 has no test. The R7 commit addressed Codex's namespaced ID feedback but did not respond to DEREM-19.

Further review is blocked until DEREM-19 is addressed. The fix is one line:

expect(isGeminiFlashThinkingLevelModelName("gemini-3-flash-lite")).toBe(false);

🤖 This review was automatically generated with Coder Agents.

@ThomasK33
Member Author

Clarifying DEREM-19: this is already addressed at PR head 644297fcd.

The requested assertion exists in src/common/utils/thinking/policy.test.ts:

expect(isGeminiFlashThinkingLevelModelName("gemini-3-flash-lite")).toBe(false);

It is in the test named "does not classify Gemini Flash Lite variants as Flash thinking-level chat models", alongside the existing gemini-3.5-flash-lite assertion.

Validation after adding it:

  • bun test src/common/utils/thinking/policy.test.ts
  • make static-check

@ThomasK33
Member Author

/coder-agents-review

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 👍

@coder-agents-review coder-agents-review Bot left a comment

All 19 findings from rounds 1-6 are closed. DEREM-19 (the last open P3) is fixed. R8 panel (6 reviewers): zero new findings. Netero clean. 214 tests pass.

This PR is ready for human review and merge.

🤖 This review was automatically generated with Coder Agents.

@ThomasK33 ThomasK33 added this pull request to the merge queue May 20, 2026
Merged via the queue into main with commit 58a06c3 May 20, 2026
24 checks passed
@ThomasK33 ThomasK33 deleted the model-updates-f887 branch May 20, 2026 08:16